Oozie Installation and Configuration

Basic Setup

Environment Setup

Oozie Server Setup

Setting Up Oozie with an Alternate Tomcat

Database Configuration

Oozie Configuration

Oozie Configuration Properties

Logging Configuration

Oozie User Authentication Configuration

Oozie Hadoop Authentication Configuration

ser ProxyUser Configuration

User Authorization Configuration

Oozie System ID Configuration

Filesystem Configuration

HCatalog Configuration

Notifications Configuration

Setting Up Oozie with HTTPS (SSL)

To use a Self-Signed Certificate

To use a Certificate from a Certificate Authority

Configure the Oozie Server to use SSL (HTTPS)

Configure the Oozie Client to connect using SSL (HTTPS)

Connect to the Oozie Web UI using SSL (HTTPS)

Additional considerations for Oozie HA with SSL

Fine Tuning an Oozie Server

Using Metrics instead of Instrumentation

High Availability (HA)

Pre-requisites

Installation/Configuration Steps

Security

JobId sequence

Starting and Stopping Oozie

Oozie Command Line Installation

Oozie Share Lib

Oozie Coordinators/Bundles Processing Timezone

MapReduce Workflow Uber Jars

Advanced/Custom Environment Settings

Basic Setup

参照 Oozie Quick Start .

Environment Setup

Oozie 忽略任何OOZIE_HOME设置,其自动计算; 在conf/oozie-env.sh文件中配置环境变量:

 CATALINA_OPTS:内置Tomcat运行设置,指定这个属性,没有默认值;

 OOZIE_CONFIG_FILE:加载配置文件,默认值 `oozie-site.xml`;

 OOZIE_LOGS:Oozie日志目录,默认 `logs/`

 OOZIE_LOG4J_FILE:Oozie Log4J配置文件,默认值 ` oozie-log4j.properties `

 OOZIE_LOG4J_RELOAD:周期重新加载 Log4J配置,默认 10

 OOZIE_HTTP_PORT:Oozie服务器运行端口,默认 11000

 OOZIE_ADMIN_PORT :Oozie 管理员端口,默认 11001

 OOZIE_HTTP_HOSTNAME:Oozie服务器运行的主机名,默认值为 hostname -f的输出值

 OOZIE_BASE_URL:action回调给Oozie基本URL,默认值 `http://${OOZIE_HTTP_HOSTNAME}:${OOZIE_HTTP_PORT}/oozie`

 OOZIE_CHECK_OWNER:设置为 `true`, Oozie  setup/start/run/stop 脚本将坚持Oozie安装目录是否匹配用户调用这脚本,默认值未定义,并翻译为 `false`

如果Oozie配置使用 HTTPS (SSL),这需要使用如下环境变量:

 OOZIE_HTTPS_PORT:Oozie运行时,使用HTTPS端口,默认是  11443

 OOZIE_HTTPS_KEYSTORE_FILE :密钥文件存放位置,默认值为  `${HOME}/.keystore`

 OOZIE_INSTANCE_ID:Oozie服务器实例ID,使用HA时,灭个服务器都有唯一实例ID,默认值 ` ${OOZIE_HTTP_HOSTNAME}`

Oozie Server Setup

oozie-setup.sh脚本用来启动内置 Tomcat服务器来运行Oozie; 脚本选项有:

Usage  : oozie-setup.sh <OPTIONS>"
         prepare-war [-d directory] [-secure] (-d identifies an alternative directory for processing jars"
            -secure             配置 war文件用于HTTPS(SSL))"
         sharelib create -fs FS_URI [-locallib SHARED_LIBRARY] (创建sharelib,"
                  FS_URI is the fs.default.name"
                  for hdfs uri; SHARED_LIBRARY, path to the"
                  Oozie sharelib to install, it can be a tarball"
                  or an expanded version of it. If ommited,"
                  the Oozie sharelib tarball from the Oozie"
                  installation directory will be used)"
                  (action failes if sharelib is already installed"
                  in HDFS)"
         sharelib upgrade -fs FS_URI [-locallib SHARED_LIBRARY] ([deprecated][use create command to create new version]
                   upgrade existing sharelib, fails if there"
                   is no existing sharelib installed in HDFS)"
         db create|upgrade|postupgrade -run [-sqlfile <FILE>] (create, upgrade or postupgrade oozie db with an"
                   optional sql file)"
         (without options prints usage information)"

如果目录libext/在安装目录下,oozie-setup.sh脚本将包含该目录下的所有JAR文件在Oozie WAR文件中.

如果 ExtJS ZIP 文件在libext/,同样将被添加到WAR中,这个 ExtJS 库将被命名为 ext-2.2.zip

启动Oozie使用切换 Tomcat

使用addtowar.sh脚本准备Oozie服务器,如果Oozie运行在不同servlet容器中时,该脚本将添加 Hadoop JARs,JDBC JARS和 ExtJS库给Oozie WAR文件,可用选项有:

 Usage  : addtowar <OPTIONS>
 Options: -inputwar INPUT_OOZIE_WAR
          -outputwar OUTPUT_OOZIE_WAR
          [-hadoop HADOOP_VERSION HADOOP_PATH]
          [-extjs EXTJS_PATH]
          [-jars JARS_PATH] (multiple JAR path separated by ':')
          [-secureWeb WEB_XML_PATH] (path to secure web.xml)

原始 oozie.war 文件在 Oozie服务器安装目录,当Hadoop JARs 和the ExtJS库添加到 oozie.war文件 ,Oozie准备运行;删除任何先前的部署容器的oozie.war,如果使用 Tomcat,复制准备好的 oozie.war到Tomcat目录 webapps/;

Oozie实例只能部署在先前的 Tomcat实例中;

Database Configuration

Oozie可用使用 HSQL, Derby, MySQL, Oracle, PostgreSQL or SQL Server数据库,默认使用内置 Derby数据库;Oozie绑定 JDBC启动 给HSQL,内置Derby和 PostgreSQL;

HSQL是常规测试用例作为内存数据库,所有数据在Oozie停止后消失;

如果是Derby,MySQL, Oracle, PostgreSQL, or SQL Server数据库,Oozie数据库 schema必须创建使用 ooziedb.sh命令行工具;

如果使用MySQL, Oracle, or SQL Server,相应的JDBC驱动Jar文件必须复制到 libext/目录下并且必须添加到 Oozie WAR文件中使用bin/addtowar.shoozie-setup.sh脚本 -jars选项;

SQL 数据库用于Oozie配置使用下面配置属性:

oozie.db.schema.name=oozie
oozie.service.JPAService.create.db.schema=false
oozie.service.JPAService.validate.db.connection=false
oozie.service.JPAService.jdbc.driver=org.apache.derby.jdbc.EmbeddedDriver
oozie.service.JPAService.jdbc.url=jdbc:derby:${oozie.data.dir}/${oozie.db.schema.name}-db;create=true
oozie.service.JPAService.jdbc.username=sa
oozie.service.JPAService.jdbc.password=
oozie.service.JPAService.pool.max.active.conn=10

如果 oozie.db.schema.create属性 设置为 true (默认false),Oozie 表将自动创建不需要使用 ooziedb命令行工具,仅在开发模式建议设置这个属性 为 true,

注意: 如果使用oozie.db.schema.create属性设置为 true, oozie.service.JPAService.validate.db.connection属性将被忽视并且Oozie处理设置为false;

oozie-site.xml配置数据库配置,执行ooziedb.sh命令行工具创建数据库: $ bin/ooziedb.sh create -sqlfile oozie.sql -runValidate DB Connection.

如果使用 MySQL, Oracle, or SQL Server数据库,配置相应JDBC驱动jar到 libext/目录,在运行 ooziedb.sh命令行工具时; 如果使用 -run选项, -sqlfile选项将被使用,所有数据库的改变都会写入指定文件,并且数据库不会修改;

如果使用HSQL,没有必要使用 ooziedb命令工具,在 oozie-site.xml使用下面配置:

  oozie.db.schema.name=oozie
  oozie.service.JPAService.create.db.schema=true
  oozie.service.JPAService.validate.db.connection=false
  oozie.service.JPAService.jdbc.driver=org.hsqldb.jdbcDriver
  oozie.service.JPAService.jdbc.url=jdbc:hsqldb:mem:${oozie.db.schema.name}
  oozie.service.JPAService.jdbc.username=sa
  oozie.service.JPAService.jdbc.password=
  oozie.service.JPAService.pool.max.active.conn=10

Oozie Configuration

Oozie配置从 conf/ 目录下读取,包含如下3个文件:

oozie-site.xml : Oozie server configuration
oozie-log4j.properties : Oozie logging configuration
adminusers.txt : Oozie admin users list

Oozie Configuration Properties

所有属性都在 oozie-default.xml 文件中定义,Oozie分解 配置属性使用如下顺序:

Java System property-> oozie-site.xml -> oozie-default.xml

Logging Configuration

默认 oozie-log4j.properties 配置文件,如果修改了,将自动加载;默认 在 logs/目录下,Oozie有4种不同 日志文件:

oozie.log: web services log streaming works from this log
oozie-ops.log: messages for Admin/Operations to monitor
oozie-instrumentation.log: intrumentation data, every 60 seconds (configurable)
oozie-audit.log: audit messages, workflow jobs changes

Oozie User Authentication Configuration

Oozie支持 Kerberos HTTP SPNEGO 身份验证,pseudo/simple 和anonymous 用于客户端访问;

如果 Pseudo/simple or Kerberos HTTP SPNEGO 身份验证使用,之后的请求将携带 HTTP Cookie; Oozie使用 Apache Hadoop-Auth (Java HTTP SPENGO) 库用于验证,这个库可以拓展支持其它验证机制;验证配置:

oozie.authentication.type=simple             --simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#
oozie.authentication.token.validity=36000    --有效时间
oozie.authentication.signature.secret=       --未设置,则启动自动创建  
oozie.authentication.cookie.domain=          --HTTP cookie 存储的 domain
oozie.authentication.simple.anonymous.allowed=true    --是否允许匿名用户
oozie.authentication.kerberos.principal=HTTP/localhost@${local.realm}
                                            --主要用于HTTP/localhost
oozie.authentication.kerberos.keytab=${oozie.service.HadoopAccessorService.keytab.file}
                                            --keytab 文件位置

Oozie Hadoop Authentication Configuration

Oozie Hadoop验证配置:

oozie.service.HadoopAccessorService.kerberos.enabled=false  --启用 设置true
-- 如下配置必须配置正确
local.realm=LOCALHOST
oozie.service.HadoopAccessorService.keytab.file=${user.home}/oozie.keytab
oozie.service.HadoopAccessorService.kerberos.principal=${user.name}/localhost@{local.realm}

User ProxyUser Configuration

Oozie支持代理用户操作,但提供如下限制能力:1.代理用户是一个明确配置为每个代理用户基础;2.代理用户可以被限制到一系列主机;3.代理用户被限制到非本组;

配置代理用户: oozie.service.ProxyUserService.proxyuser.#USER#.hosts:#USER#可以扮演其它用户访问的主机列表 oozie.service.ProxyUserService.proxyuser.#USER#.groups:#USER#扮演其它用户所属组列表

User Authorization(授权) Configuration

基本授权模型:1.读取所有作业;2.写自己作业的访问;3.写ACL控制的作业;4.读取管理员操作;5.管理员写所有作业权限;6.管理员访问管理员操作;

如果security 关闭,所有用户都作为 管理员用户;配置安全模式: oozie.service.AuthorizationService.security.enabled=false

旧的 ACL模型,仅支持oozie-site.xml的下面配置: oozie.service.AuthorizationService.default.group.as.acl=true

管理员用户由 oozie.service.AuthorizationService.admin.groups属性决定,如果这个属性没有设置,那么管理员用户在conf/adminusers.txt文件中指定,语法为:1. 一个用户一行;2.空行和 #开头行将被忽略;

Oozie System ID Configuration

默认Oozie系统ID为:oozie.system.id=oozie-${user.name}

Filesystem Configuration

配置文件系统在 oozie-site.xml: 可用value有 hdfs, hftp, webhdfs, viewfs,*

<property>
    <name>oozie.service.HadoopAccessorService.supported.filesystems</name>
    <value>hdfs</value>
</property>

HCatalog Configuration

参见:Oozie HCatalog Integration

添加 HCatalog jars到 Oozie war文件中:

oozie.service.HCatAccessorService.hcat.configuration /local/filesystem/path/to/hive-site.xml

配置HCatalog URLHandling:

<property>
    <name>oozie.service.URIHandlerService.uri.handlers</name>
    <value>org.apache.oozie.dependency.FSURIHandler,org.apache.oozie.dependency.HCatURIHandler</value>
</property>

《未完成》

Notifications Configuration

Oozie支持发布通知给一个JMS提供者用于作业状态改变和SLA遇见和丢失事件;

Message Broker Installation 发送和接受消息,JMS broker需要安装,可用使用 Aapache ActiveMQ;

Services 添加或修改 oozie-site.xml中的属性:

 <property>
    <name>oozie.services.ext</name>
    <value>
        org.apache.oozie.service.JMSAccessorService,
        org.apache.oozie.service.JMSTopicService,
        org.apache.oozie.service.EventHandlerService,
        org.apache.oozie.sla.service.SLAService
    </value>
 </property>

Event Handlers

 <property>
    <name>oozie.service.EventHandlerService.event.listeners</name>
    <value>
        org.apache.oozie.jms.JMSJobEventListener,
        org.apache.oozie.sla.listener.SLAJobEventListener,
        org.apache.oozie.jms.JMSSLAEventListener,
        org.apache.oozie.sla.listener.SLAEmailEventListener
    </value>
 </property>

Setting Up Oozie with HTTPS (SSL)

To use a Self-Signed Certificate

To use a Certificate from a Certificate Authority

Configure the Oozie Server to use SSL (HTTPS)

Configure the Oozie Client to connect using SSL (HTTPS)

Connect to the Oozie Web UI using SSL (HTTPS)

Additional considerations for Oozie HA with SSL

Fine Tuning an Oozie Server

Using Metrics instead of Instrumentation

High Availability (HA)

Pre-requisites

Installation/Configuration Steps

Security

JobId sequence

Starting and Stopping Oozie

Oozie Command Line Installation

Oozie Share Lib

Oozie Coordinators/Bundles Processing Timezone

MapReduce Workflow Uber Jars

Advanced/Custom Environment Settings